There are a number of efforts towards building applications that leverage temporal information in documents. The NEAT (News Exploration Along Time) prototype system that we demonstrate here is an attempt at building an intuitive and exploratory interface for search results over large news archives using timelines. The demonstration uses the New York Times Annotated Corpus as...
All kinds of Web sites invite visitors to provide feedback on comment boards. Typically, submitted comments are published immediately on the same page, so that new visitors can get an idea of the opinions of previous visitors. Popular multimedia items, such as videos and images, frequently get up to thousands of comments, which is too many to read in a reasonable time. I.e., visitors read, if at...
The objective of Eurogene is to collect a critical mass of educational content in the field of human genetics in nine European languages and to build a platform that will support the retrieval, sharing and navigation over the learning content. The Eurogene platform is already operational and is being used by the genetics community. In this paper, a part of the Eurogene platform related to the retrieval...
Netspeak is a Web service which helps writers in finding alternative expressions for what they want to say. It provides a large index of writing samples in the form of n-grams, n ≤ 5, along with an efficient means to retrieve them by the use of wildcard queries. When in doubt about a phrasing, a user can get additional evidence by retrieving samples that match a given context. The figure below shows...
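The idea of retrieving n-grams by wildcard query can be illustrated with a minimal Python sketch. It is not Netspeak's implementation or query syntax: the function name `match_ngrams`, the use of `?` as a placeholder for exactly one word, and the toy counts are all assumptions made for illustration.

```python
def match_ngrams(pattern, ngram_counts):
    """Return (n-gram, count) pairs matching the pattern, most frequent first.

    Assumption: '?' in the pattern stands for exactly one arbitrary word.
    """
    wanted = pattern.split()
    hits = []
    for ngram, count in ngram_counts.items():
        words = ngram.split()
        # Lengths must agree, and every non-wildcard position must match.
        if len(words) == len(wanted) and all(
            w == "?" or w == g for w, g in zip(wanted, words)
        ):
            hits.append((ngram, count))
    # Higher counts first, so the most common phrasing leads the list.
    return sorted(hits, key=lambda pair: -pair[1])


# Toy index with made-up frequencies (hypothetical data).
toy_counts = {
    "waiting for you": 50,
    "waiting on you": 5,
    "waiting for him": 20,
}
results = match_ngrams("waiting ? you", toy_counts)
```

A linear scan like this only conveys the query semantics; serving a large index efficiently would need a dedicated index structure.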
Over the last two decades, Interactive Information Retrieval (IIR) has established a new direction within the long tradition of IR that places the user at its center and poses new challenges for system evaluation. IR systems can improve performance by utilizing information about the entire interactive process of search. This approach has so far been explored only initially [1,2] with much potential...
In this talk we will discuss where Web search may be heading, focusing on a number of large-scale research projects that are trying to develop the “next big thing” in Web search. We will consider some important recent initiatives on how to improve the quality of the Web search experience by helping search engines to respond to our individual needs and preferences. In turn, we will focus on some innovative...
In the last ten years natural language processing (NLP) has become an essential part of many information retrieval systems, mainly in the guise of question answering, summarization, machine translation and preprocessing such as decompounding. However, most of these methods are shallow. More complex natural language processing is not yet sufficiently reliable to be used in IR. I will discuss how new...
Image annotation, the task of automatically generating description words for a picture, is a key component in various image search and retrieval applications. Creating image databases for model development is, however, costly and time consuming, since the keywords must be hand-coded and the process repeated for new collections. In this work we exploit the vast resource of images and documents available...
In this article we introduce a visualisation technique for analysing relevance and interaction data. It allows the researcher to quickly detect emerging patterns in both interactions and relevance criteria usage. The concept of “relevance criteria profile”, which provides a global view of user behaviour in judging the relevance of the retrieved information, is developed. We discuss by example, using...
Pseudo-Relevance Feedback (PRF) assumes that the top-ranking n documents of the initial retrieval are relevant and extracts expansion terms from them. In this work, we introduce the notion of pseudo-irrelevant documents, i.e. high-scoring documents outside of top n that are highly unlikely to be relevant. We show how pseudo-irrelevant documents can be used to extract better expansion terms from the...
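The contrast between pseudo-relevant and pseudo-irrelevant documents can be sketched in a few lines of Python. This is a simplified illustration, not the method of the paper: the scoring rule (term frequency in the top n minus a penalty for frequency in the documents ranked just below them), the function name `expansion_terms`, and the parameters `m`, `k` and `penalty` are assumptions.

```python
from collections import Counter


def expansion_terms(ranked_docs, n=2, m=2, k=3, penalty=1.0):
    """Pick k expansion terms from a ranked list of tokenized documents.

    ranked_docs: list of token lists, best-scoring document first.
    Assumption: ranks n..n+m-1 stand in for the pseudo-irrelevant set.
    """
    top = Counter(t for doc in ranked_docs[:n] for t in doc)
    neg = Counter(t for doc in ranked_docs[n:n + m] for t in doc)
    # Reward terms frequent in the top n, penalize terms that are also
    # frequent in the pseudo-irrelevant documents.
    scores = {t: top[t] - penalty * neg[t] for t in top}
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy ranking for an ambiguous query (hypothetical data).
ranked = [
    ["jaguar", "car", "engine"],
    ["jaguar", "car", "speed"],
    ["jaguar", "cat", "jungle"],
    ["jaguar", "animal", "cat"],
]
terms = expansion_terms(ranked)
```

In this toy ranking the query term "jaguar" appears in both sets and is cancelled out, so only terms specific to the top documents survive.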
A promising way to accelerate similarity search is semantic hashing which designs compact binary codes for a large number of documents so that semantically similar documents are mapped to similar codes within a short Hamming distance. In this paper, we introduce the novel problem of co-hashing where both documents and terms are hashed simultaneously according to their semantic similarities. Furthermore,...
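The retrieval side of semantic hashing, finding documents whose codes lie within a short Hamming distance of a query code, can be sketched as follows. The sketch assumes the binary codes already exist and are stored as Python integers; how they are learned, and the co-hashing of terms, are not shown. The names `hamming` and `within_radius` are illustrative.

```python
def hamming(a: int, b: int) -> int:
    """Number of bit positions in which two binary codes differ."""
    return bin(a ^ b).count("1")


def within_radius(query_code, codes, radius):
    """Return ids of documents whose code is at most `radius` bits away."""
    return [
        doc_id
        for doc_id, code in codes.items()
        if hamming(query_code, code) <= radius
    ]


# Hypothetical 4-bit codes for three documents.
codes = {"d1": 0b1011, "d2": 0b1001, "d3": 0b0100}
near = within_radius(0b1010, codes, radius=2)
```

Because the comparison is a bitwise XOR followed by a bit count, it stays cheap even for long codes.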
This paper explores how to predict query difficulty for contextual image retrieval. We reformulate the problem as the task of predicting how difficult it is to represent a query as images. We propose to use machine learning algorithms to learn the query difficulty prediction models based on the characteristics of the query words as well as the query context. More specifically, we focus on noun word/phrase...
We present two methods for estimating replacement probabilities without using parallel corpora. The first method proposed exploits the possible translation probabilities latent in Machine Readable Dictionaries (MRD). The second method is more robust, and exploits context similarity-based techniques in order to estimate word translation probabilities using the Internet as a bilingual comparable corpus...
Motivated by the success of social tagging in web communities, this paper proposes a novel document tagging method more suitable for the enterprise environment, named weighted tagging. The method allows users to tag a document with weighted tags which are then used as an additional source for the query matching and relevance scoring to improve the search results. The method enables a user-driven search...
We analyse the statistical behavior of query-associated quantities in query-logs, namely, the sum and mean of IDF of query terms, otherwise known as query specificity and query mean specificity. We narrow down the possibilities for modeling their distributions to gamma, log-normal, or log-logistic, depending on query length and on whether the sum or the mean is considered. The results have applications...
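The two quantities named above, the sum and the mean of the IDF of the query terms, can be computed with a short Python sketch. The IDF variant used here, log(N / df), is an assumption, since the abstract does not say which formulation is used; the function names and the toy document frequencies are also illustrative.

```python
import math


def idf(term, doc_freq, num_docs):
    """Inverse document frequency, assuming the plain log(N / df) variant."""
    return math.log(num_docs / doc_freq[term])


def query_specificity(query_terms, doc_freq, num_docs):
    """Return (sum of IDF, mean of IDF) over the query terms."""
    values = [idf(t, doc_freq, num_docs) for t in query_terms]
    total = sum(values)
    return total, total / len(values)


# Hypothetical document frequencies in a collection of 1000 documents.
doc_freq = {"retrieval": 10, "system": 100}
specificity, mean_specificity = query_specificity(
    ["retrieval", "system"], doc_freq, num_docs=1000
)
```

The sum grows with query length while the mean does not, which is one reason the two may follow different distributions.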